Realizing Midcourse Penetration With Deep Reinforcement Learning


Abstract

A midcourse maneuver controller is obtained using deep reinforcement learning to maintain the survivability of a ballistic missile. First, the midcourse penetration problem is abstracted as a Markov decision process (MDP) with an unknown system state equation. Then, a controller formed by a Dueling Double Deep Q (D3Q) neural network is used to approximate the state-action value function of the MDP. To let the controller improve its intelligence through learning, the state space, action space, and instant reward of the MDP are customized. The controller takes the real-time situation as input and outputs the ignition states of the pulse motors. Offline training shows that the controller can converge to the optimal strategy after approximately 65 hours. Online tests demonstrate its ability to avoid the interceptor intelligently and to account for re-entry error. In scenarios with multiple random factors, the controller achieved a penetration probability of 100% and a mean re-entry error of less than 5000 m.
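The two ideas behind the D3Q name can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the standard dueling aggregation (state value plus mean-centred advantages) and the standard double Q-learning target (the online network selects the next action, the target network evaluates it). The function names, the two-action "hold/ignite" reading of the pulse-motor output, and all numbers are assumptions made for illustration.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def double_q_target(reward: float, gamma: float, done: bool,
                    next_q_online: Sequence[float],
                    next_q_target: Sequence[float]) -> float:
    """Double Q-learning target: online net picks the action, target net scores it."""
    if done:
        # Terminal transition: no bootstrapped value is added.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical two-action case (0 = hold, 1 = ignite a pulse motor).
# The value and advantage numbers stand in for network outputs.
q_online = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5])
q_target = dueling_q_values(state_value=0.8, advantages=[0.2, 0.4])
target = double_q_target(reward=0.1, gamma=0.9, done=False,
                         next_q_online=q_online, next_q_target=q_target)
```

In this example the online estimates prefer action 0, so the target uses the target network's value for action 0 even though the target network itself rates action 1 higher. Decoupling selection from evaluation in this way is what reduces the overestimation that a single max operator introduces.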


Similar resources

Deep Reinforcement Learning with POMDPs

Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has looked at treating the Atari problem as a partially observable Markov decision process (POMDP) by adding imperfect state information through image flickering [2]. However, these approaches leverage a convolutional network structure...


Reinforcement Learning with Deep Architectures

There is both theoretical and empirical evidence that deep architectures may be more appropriate than shallow architectures for learning functions which exhibit hierarchical structure, and which can represent high level abstractions. An important development in machine learning research in the past few years has been a collection of algorithms that can train various deep architectures effective...


Deep Reinforcement Learning with Double Q-Learning

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether this harms performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-le...


Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristics of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...


Collaborative Deep Reinforcement Learning

Besides independent learning, human learning process is highly improved by summarizing what has been learned, communicating it with peers, and subsequently fusing knowledge from different sources to assist the current learning goal. This collaborative learning procedure ensures that the knowledge is shared, continuously refined, and concluded from different perspectives to construct a more profound...



Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3091605